Parallel LU Factorization on GPU Cluster

نویسندگان
چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Parallel LU Factorization on GPU Cluster

This paper describes our progress in developing software for performing parallel LU factorization of a large dense matrix on a GPU cluster. Three approaches, with increasing software complexity, are considered: (i) a naive “thunking” approach that links the existing parallel ScaLAPACK software library with cuBLAS through a software emulation layer; (ii) a more intrusive magmaBLAS implementation...

متن کامل

Lu Factorization on Parallel Computers

Abstract-A new parallel algorithm for the LU factorization of a given dense matrix A is described. The case of banded matrices is also considered. This algorithm can be combined with Sameh and Brent’s [SIAM J. Numer. Anal. 14, 1101-I 113. (1977)] to obtain the solution of a linear system of algebraic equations. The arithmetic complexity for the dense case is in’ ($bn in the banded case), using ...

متن کامل

Parallel Graph Coloring with Applications to the Incomplete-LU Factorization on the GPU

In this technical report we study different parallel graph coloring algorithms and their application to the incomplete-LU factorization. We implement graph coloring based on different heuristics and showcase their performance on the GPU. We also present a comprehensive comparison of level-scheduling and graph coloring approaches for the incomplete-LU factorization and triangular solve. We discu...

متن کامل

Parallel Incomplete-LU and Cholesky Factorization in the Preconditioned Iterative Methods on the GPU

A novel algorithm for computing the incomplete-LU and Cholesky factorization with 0 fill-in on a graphics processing unit (GPU) is proposed. It implements the incomplete factorization of the given matrix in two phases. First, the symbolic analysis phase builds a dependency graph based on the matrix sparsity pattern and groups the independent rows into levels. Second, the numerical factorization...

متن کامل

Fine-Grained Parallel Incomplete LU Factorization

This paper presents a new fine-grained parallel algorithm for computing an incomplete LU factorization. All nonzeros in the incomplete factors can be computed in parallel and asynchronously, using one or more sweeps that iteratively improve the accuracy of the factorization. Unlike existing parallel algorithms, the new algorithm does not depend on reordering the matrix. Numerical tests show tha...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Procedia Computer Science

سال: 2012

ISSN: 1877-0509

DOI: 10.1016/j.procs.2012.04.008